REMARKS

Claims 1, 8-16, 60, 77 and 78 have been amended. Claims 3-7, 19-59, 61-63, 66-74 
and 76 have been canceled without prejudice or disclaimer. Subsequent to the entry of the 
present amendment, claims 1, 2, 8-16, 17, 18, 60, 64, 65, 75, 77, 78, 79 and 80 are pending and 
at issue. These amendments and additions add no new matter as the claim language is fully
supported by the specification and original claims.

Applicants note that some of the claims as amended recite a single chain antibody 
having the amino acid sequence set forth in SEQ ID NO:2. The sequence listing filed April
27, 2001 clearly indicates that the amino acid sequence of SEQ ID NO:2 is encoded by the
nucleic acid sequence of SEQ ID NO:1. Reconsideration of the application in light of the
foregoing amendments and the following discussion is respectfully requested. 

Claim Objections 

Claims 9-16 have been objected to as depending from a canceled claim. The claims 
have been amended to depend from a pending claim. Thus, this objection is now moot.

The Rejection under 35 U.S.C. § 112, First Paragraph, Enablement

Claims 1-2, 5-6, 8-18, 60, 63-65, 75-76 and 79-80 stand rejected under 35 U.S.C. 
§112, first paragraph as allegedly not enabled by the specification as filed. This rejection is 
moot with regard to canceled claims 5-6, 63 and 76. Applicant respectfully traverses the 
rejection as it may apply to the amended claims. 

The Office Action acknowledges that the present specification enables a variety of 
embodiments, as listed in points 1 to 15, starting on page 2, item 4, of the Office Action, 
including methods that include a single chain antibody encoded by SEQ ID NO:1 that binds
specifically to phOx and a linker coupling a probe to a ligand such as phOx. However, the
Office Action alleges that the specification does not enable a method that includes any single
chain antibody, including a single chain antibody that has at least 30% sequence identity to
SEQ ID NO:1, that binds to any ligand (emphasis in original). Claims 1 and 60 have been
amended to recite a single chain antibody that specifically binds to phOx. In addition, claims 1
and 60 have been amended to recite a ligand comprising phOx. The present disclosure
provides working examples of methods for localizing a probe that includes an antibody that
binds phOx. The specific single chain antibody/ligand combination functions in the method to
localize the probe to the vicinity of the ligand, allowing visualization of the ligand.
Accordingly, the claimed methods should not be limited solely to the use of a single chain
antibody having the amino acid sequence set forth in SEQ ID NO:2. It would be a matter of
routine experimentation, not undue experimentation, for one skilled in the art to identify a
single chain antibody that specifically binds to phOx and possesses an amino acid sequence
different from SEQ ID NO:2.

Regarding undue experimentation, as stated by the Patent Office, the Federal Circuit in 
In re Wands directed that the focus of the enablement inquiry should be whether the 
experimentation needed to practice the invention is or is not "undue" experimentation. The 
court set forth specific factors to be considered. 

One of these factors is "the quantity of experimentation necessary." Guidance as to 
how much experimentation may be needed and still not be "undue" is set forth by the Federal 
Circuit in, e.g., Hybritech, Inc. v. Monoclonal Antibodies, Inc. In that case, an applicant had
claims that were generic to all IgM antibodies directed to a specific antigen. However, only a 
single antibody producing cell line had been deposited. The PTO had rejected claims that 
were generic to all antibodies directed to the antigen as lacking an enabling disclosure. 

The Federal Circuit reversed, noting that the evidence indicated that those skilled in the 
monoclonal antibody art could, using the state of the art and applicants' written disclosure, 
produce and screen new hybridomas secreting other monoclonal antibodies falling within the 
genus without undue experimentation. The court held that applicants' claims need not be 
limited to the specific, single antibody secreted by the deposited hybridoma cell line 
(significantly, the genus of antibodies was allowed even though only one antibody species was 
disclosed). The court was acknowledging that, because practitioners in that art are prepared to 
screen large numbers of negatives in order to find a sample that has the desired properties, the 
screening that would be necessary to make additional antibody species was not "undue
experimentation."

Analogously, practitioners of molecular biology for the instant invention also recognize 
that many rounds of screening may be necessary to identify and isolate single chain antibodies 
that specifically bind to phOx. However, the procedures for isolating such antibodies are 
widely accepted, routine protocols, not requiring "undue experimentation" to be practiced. 
Accordingly, one skilled in the art has sufficient guidance by the specification to practice the 
claimed methods without undue experimentation. 

A skilled artisan can practice the present invention using standard research techniques
for isolating single chain antibodies with a particular binding specificity. The level of skill and
knowledge in the art is exemplified by patents that were filed before the filing date of the
instant application, claiming an expression vector comprising a DNA sequence encoding a
single chain antibody. For example, in U.S. Patent No. 6,017,754, claim 1 reads in part:

1. A eukaryotic expression vector . . . comprising:

a first DNA sequence encoding an anti-hapten single-chain antibody, 
which antibody binds to a specific hapten, wherein said hapten is 4- 
ethoxymethylene-2-phenyl-2-oxazolin-5-one (i.e., phOx).* 

Applicants respectfully submit that the written disclosure of the instant application is
supplemented by the knowledge held by one of ordinary skill in the art. The skilled artisan is
one who is knowledgeable about basic laboratory/research protocols. It is well settled law that
an Applicant need not include disclosure that was well known in the art. Furthermore, in
addition to the knowledge held by the skilled artisan, Applicants provide exemplary basic



* U.S. Patent No. 6,017,754, column 33, lines 40-44. 
regarding a single chain antibody that can be used in the methods of the invention (i.e., one that
specifically binds to phOx). Applicants respectfully submit that the application, at the time of 
filing, taught one of skill in the art how to practice the claimed method. 



In addition, Applicants have amended claim 77 to recite a single chain antibody that
comprises: 

a) the amino acid sequence set forth in SEQ ID NO:2;

b) the amino acid sequence set forth in SEQ ID NO:2 with up to 30
conservative amino acid substitutions;

c) an amino acid sequence at least 95% identical to SEQ ID NO:2;

d) an amino acid sequence encoded by the nucleic acid sequence set forth
in SEQ ID NO:1; or

e) an amino acid sequence encoded by a nucleic acid sequence at least
95% identical to SEQ ID NO:1.

Applicants have also added new claim 81 which recites a binding partner encoded by: 

a) a nucleic acid sequence comprising SEQ ID NO:1;

b) a nucleic acid sequence at least 95% identical to SEQ ID NO:1;

c) a nucleic acid sequence encoding a polypeptide consisting of the amino
acid sequence set forth in SEQ ID NO:2 with up to 30 conservative
amino acid substitutions; or

d) a nucleic acid sequence encoding a polypeptide consisting of an amino
acid sequence at least 95% identical to SEQ ID NO:2.
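
For illustration only, the two numerical criteria recited above (a capped number of conservative substitutions, or a minimum percent identity) can be sketched in Python. This is a minimal sketch under stated assumptions: the sequences are taken to be pre-aligned and of equal length, the grouping of residues treated as "conservative" is a hypothetical one chosen for the example and is not the definition given in the specification, and the short toy sequences are not SEQ ID NO:2.

```python
# Hypothetical residue groups used to decide whether a substitution is
# "conservative". The specification's own definition may differ.
CONSERVATIVE_GROUPS = [
    set("AVLIM"),  # aliphatic / nonpolar
    set("FYW"),    # aromatic
    set("ST"),     # hydroxyl-containing
    set("NQ"),     # amide
    set("DE"),     # acidic
    set("KRH"),    # basic
]


def percent_identity(ref, cand):
    """Percent of positions that match between two pre-aligned sequences."""
    if not ref or len(ref) != len(cand):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ref, cand))
    return 100.0 * matches / len(ref)


def is_conservative(a, b):
    """True if residues a and b fall in the same hypothetical group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def count_substitutions(ref, cand):
    """Return (conservative, nonconservative) substitution counts."""
    if len(ref) != len(cand):
        raise ValueError("sequences must be of equal length")
    conservative = nonconservative = 0
    for a, b in zip(ref, cand):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            nonconservative += 1
    return conservative, nonconservative


def meets_criteria(ref, cand, max_conservative=30, min_identity=95.0):
    """True if cand differs from ref only by up to max_conservative
    conservative substitutions, or is at least min_identity percent identical."""
    conservative, nonconservative = count_substitutions(ref, cand)
    within_cap = nonconservative == 0 and conservative <= max_conservative
    return within_cap or percent_identity(ref, cand) >= min_identity


# Toy example: a 10-residue reference and two single-substitution variants.
reference = "ACDEFGHIKL"
variant_conservative = "ACDEFGHIKV"      # L -> V, same hypothetical group
variant_nonconservative = "ACDEFGHIKD"   # L -> D, different groups
```

In this toy example the first variant satisfies the conservative-substitution criterion even though its identity to the reference is below 95%, while the second variant satisfies neither criterion. Whether a variant that passes either numerical test still binds phOx is a separate functional question that this sketch does not address.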

Support for the new claims can be found beginning at page 11, line 11, bridging to page 13, 
line 23. While the Office Action has not rejected claim 77 as amended, or new claim 81, the 
Office Action does assert that the specification provides insufficient guidance as to which 
amino acids, and the corresponding nucleotides within the full length sequence of SEQ ID
NO:1, can be modified such that the resulting single chain antibody maintains the same
binding specificity as the antibody encoded by SEQ ID NO:1. To support this assertion, the
Examiner cites three references: Skolnick et al., Ngo et al., and Abaza et al. The Examiner
appears to take the position that these references demonstrate that even a single amino acid
substitution or 'conservative' amino acid substitution in a protein will often dramatically affect
the biological activity and characteristics of a protein, and concludes that undue
experimentation would be required to enable the full scope of the claims. Applicants 
respectfully disagree. 

Applicants agree that it is possible, at least in some cases, to abolish activity of a given 
protein by mutating a critical residue, as disclosed by the cited references. However, 
applicants disagree that this fact means that one of ordinary skill cannot make functional 
analogs of SEQ ID NO:2 without undue experimentation. In support of this, Applicants
provide EXHIBIT A (Bowie et al., Science 247:1305). Bowie et al. teaches, at page 1306,
col. 2, lines 12-13, that "proteins are surprisingly tolerant of amino acid substitutions." Bowie
et al. cites as evidence a study carried out on the lac repressor. Of approximately 1500 single
amino acid substitutions at 142 positions in this protein, about one-half of the substitutions
were found to be "phenotypically silent": that is, had no noticeable effect on the activity of the
protein (page 1306, col. 2, lines 14-17). Presumably the other half of the substitutions
exhibited effects ranging from slight to complete abolishment of repressor activity. Thus, one
can expect, based on Bowie et al.'s teachings, to find over half (and possibly well over half) of
random substitutions in any given protein to result in mutated proteins with full or nearly full
activity. These are far better odds than those at issue in In re Wands, 858 F.2d 731 (Fed. Cir.
1988), in which the court said that screening many hybridomas to find the few that fell within
the claims was not undue experimentation. The question is not whether it is possible to abolish
activity with a modification such as a point mutation, but rather whether one of ordinary skill
can produce, without undue experimentation, modified single chain antibodies in which the
activity of specifically binding to phOx is not abolished. Based on Bowie et al.'s teachings,
one would predict that even random substitution of residues in SEQ ID NO:2 will
result in a majority of the modified antibodies having full or partial phOx binding activity.

In view of the amendments to the claims, and in light of the above discussion, 
Applicants request withdrawal of the rejection of claims 1-2, 8-18, 60, 64-65, 75 and 79-80 
under 35 U.S.C. § 112, first paragraph. 
The Rejection under 35 U.S.C. § 112, First Paragraph, Written Description

Claims 1-3, 5-6, 8-18, 60, 63-65, 75-76 and 79-80 stand rejected under 35 U.S.C. 
§ 112, first paragraph as allegedly not adequately described by the specification. This rejection 
is moot with regard to canceled claims 3, 5-6, 63 and 76. Applicant respectfully traverses the 
rejection as it may apply to the amended claims. 

The Office Action alleges that because the specification discloses only one single chain 
antibody encoded by the nucleic acid sequence set forth in SEQ ID NO:1, it does not
sufficiently describe methods that include any single chain antibody that binds to any ligand
(citing to University of California v. Eli Lilly and Co., 43 USPQ2d 1398; and University of
Rochester v. G.D. Searle & Co., 69 USPQ2d 1886). As previously noted, claims 1 and 60 have
been amended to recite a single chain antibody that specifically binds to phOx. In addition,
claims 1 and 60 have been amended to recite a ligand comprising phOx. In view of the support
for the amended claims provided in the specification, Applicants maintain that the claimed
methods should not be limited solely to the use of a single chain antibody having the amino
acid sequence set forth in SEQ ID NO:2.

The present disclosure provides working examples of methods for localizing a probe 
that include a single chain antibody that binds phOx. While the claims as amended encompass 
the use of a genus of single chain antibodies that bind to phOx, Applicants note that the law 
does not require that the specification describe every species within the genus. As described 
above, Applicants have provided at least one amino acid sequence of a member of the genus of 
single chain antibodies used in the claimed methods. Applicants have further provided relevant
identifying characteristics of the genus encompassed by the claims (i.e., a single chain antibody
that binds to phOx). Accordingly, Applicants have clearly demonstrated that they were in
"possession of the necessary common attributes or features of the elements possessed by
members of the genus" (66 Fed. Reg. 1099, at 1106) as of the filing date of the application.

In summary, Applicant respectfully requests withdrawal of the rejection of the claims
under 35 U.S.C. § 112, first paragraph as allegedly not adequately described by the 
specification as filed. 
The Rejection under 35 U.S.C. § 103

Claims 60 and 63 stand rejected under 35 U.S.C. § 103(a) as allegedly unpatentable 
over U.S. Pat. No. 6,017,754 (referred to herein as "the '754 patent") in view of Haugland et al.
(Handbook of Fluorescent Probes and Research Chemicals 6th edition, pages 13-15, 18-19
(1996)) and WO 93/11120. This rejection is moot with regard to canceled claim 63.
Applicants traverse this rejection as it may apply to the amended claims. 

Applicants note that claim 60 has been amended to recite a single chain antibody that 
specifically binds to phOx. In addition, claim 60 has been amended to recite a ligand
comprising phOx. Finally, claim 60 has been amended to recite the step of "detecting the
probe/ligand conjugate within the cell, thereby localizing the probe within the cell." The
combination of the '754 patent and the cited secondary references does not result in a method
for localizing a probe within a cell that includes a membrane permeant conjugate for detecting
a single chain antibody or a specific binding pair member expressed from a recombinant
nucleic acid, as recited in claim 60.

In view of the fact that none of the cited references, alone or in combination, teach or 
suggest a method for localizing a probe inside a cell, Applicants request withdrawal of this 
rejection. 

Claims 1-2, 5-6, 8, 11-14, 16-17, 64-65 and 75-76 stand rejected under 35 U.S.C.
§ 103(a) as allegedly unpatentable over the '754 patent (U.S. Pat. No. 6,017,754) in view of
Schouten et al., Haugland et al. (Handbook of Fluorescent Probes and Research Chemicals 6th
edition, 1996, pages 13-15, 18-19) and WO 93/11120. This rejection is moot with regard to
canceled claims 5-6 and 76. Applicant respectfully traverses the rejection as it may apply to
the amended claims. 

The Office Action alleges that the '754 patent teaches a method of identifying and
selecting a cell to study genes of interest at a cellular level by transfecting the cell with a
plasmid that encodes a single chain antibody (sFv) directed against phOx. The Office Action
further asserts that the '754 patent teaches that the hapten (phOx) as the ligand can be
conjugated to a fluorescent (FITC) spectroscopic probe or other label via a linker moiety
(phOx-BSA-FITC) to allow for identification and selection of the transfected cell by detecting
fluorescence emission (citing column 7, lines 8-13 of the '754 patent). Schouten et al. has been
added to the list of references cited in previous Office Actions. Schouten allegedly teaches a
method of targeting a single chain antibody to various subcellular locations by fusing various
targeting signals to the antibody. Haugland et al. allegedly teach spectroscopic probes that are
membrane permeant, including BODIPY FL. WO 93/11120 allegedly teaches a flexible
aliphatic linker that is membrane permeant. Based on these assertions, the Office Action
concludes that it would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine 1) the method of using a phOx-binding single chain antibody
to identify and isolate cells, as taught in the '754 patent; 2) the single chain antibody
containing a subcellular localization signal, as taught by Schouten; 3) the permeant
linker/probe conjugate taught by Haugland; and 4) the linker taught by Haugland or by WO
93/11120, to arrive at the presently claimed method of localizing a signal inside a cell.

When a rejection depends on a combination of prior art references, there must be some
teaching, suggestion, or motivation to combine the references. The Court of Appeals for the
Federal Circuit has restated the general principle that hindsight analysis cannot be a basis for
an obviousness rejection:

The [Patent Office] did not, however, explain what specific 
understanding or technological principle within the knowledge of one of 
ordinary skill in the art would have suggested the combination [of 
references cited]. Instead, the [Patent Office] merely invoked the high 
level of skill in the field of art. If such a rote invocation could suffice to 
supply a motivation to combine, the more sophisticated scientific fields 
would rarely, if ever, experience a patentable technical advance. Instead, 
in complex scientific fields, the [Patent Office] could routinely identify 
the prior art elements in an application, invoke the lofty level of skill, and 
rest its case for rejection. To counter this potential weakness in the 
obviousness construct, the suggestion to combine requirement stands as a
critical safeguard against hindsight analysis and rote application of the 
legal test for obviousness.^ 

In the instant Office Action, the Patent Office fails to sufficiently explain what specific
understanding or technological principle within the knowledge of one of ordinary skill in the
art would have suggested the combination of the four cited references to arrive at Applicants'
claimed invention. Simply stating that one of ordinary skill in the art would be motivated to
make the combination because: 1) single chain antibodies are encoded by small nucleic acid 
coding sequences; 2) targeting signals are useful for targeting fusion polypeptides to a 
particular cellular location; 3) cell permeant probes are known to the skilled artisan; and 4) 
flexible linkers are also known to the skilled artisan, fails to articulate how the references 
suggest to the skilled artisan that the combination would result in the presently claimed 
method. 

The Examiner seems to be suggesting that the cited references demonstrate that the 
invention "could" have been made by one skilled in the art. However, this is not the test of 
obviousness. To be obvious, an invention must be somehow "taught" by the prior art. In this 
case, none of the references disclose or suggest a method for localizing a probe within a cell, 
that includes a membrane permeant conjugate for detecting a single chain antibody or a 
specific binding pair member expressed from a recombinant nucleic acid, as recited in the 
pending claims. The Examiner has the burden of explaining how the prior art suggests the 
claimed subject matter and not simply the general aspects of the invention (e.g., expression of a 
single chain antibody in a cell, the use of targeting signals, etc.). It is unclear how the 
references, even if properly combined, render the claimed invention obvious. Accordingly, 
Applicants request withdrawal of this rejection. 



^ In re Rouffet, 149 F.3d 1350 (Fed. Cir. 1998).
The Office Action rejects claim 9 (see page 16, part 12 of the Office Action), claim 10
(see page 17, part 13 of the Office Action), claims 15 and 79-80 (see page 18, part 14 of the
Office Action), and claim 18 (see page 20, part 15 of the Office Action) under 35 U.S.C. § 103(a)
as being unpatentable over the various references. Applicants respectfully traverse these 
rejections. 

Claims 9, 10, 15, 18 and 79-80 ultimately depend from independent claim 1. As
discussed above, Applicants believe that amended claim 1 is nonobvious. Applicants submit
that if an independent claim is nonobvious under 35 U.S.C. § 103, then any claim depending
therefrom is nonobvious. In re Fine, 837 F.2d 1071 (Fed. Cir. 1988); MPEP § 2143.03.
Accordingly, Applicants request withdrawal of these rejections under 35 U.S.C. § 103(a).

In view of the amendments to the claims and the above remarks, reconsideration and
favorable action on all claims are respectfully requested. Should any questions remain in view
of this communication, the Examiner is encouraged to call the undersigned so that a prompt
disposition of this application can be achieved. Please charge any additional fees, or make any
credits, to Deposit Account No. 50-1355.



Respectfully submitted,



Date: July 28, 2004




Michael Reed, J.D., Ph.D. 
Reg. No. 45,647 
Applicant's Representative 
Telephone: (858) 638-6754 
Facsimile: (858) 677-1465 
Deciphering the Message in Protein Sequences:
Tolerance to Amino Acid Substitutions 
An amino acid sequence encodes a message that determines
the shape and function of a protein. This message is
highly degenerate in that many different sequences can
code for proteins with essentially the same structure and
activity. Comparison of different sequences with similar
messages can reveal key features of the code and improve
understanding of how a protein folds and how it performs
its function.



THE GENOME IS MANIFEST LARGELY IN THE SET OF PROTEINS
that it encodes. It is the ability of these proteins to fold
into unique three-dimensional structures that allows them to
function and carry out the instructions of the genome. Thus,
comprehending the rules that relate amino acid sequence to structure
is fundamental to an understanding of biological processes.
Because an amino acid sequence contains all of the information
necessary to determine the structure of a protein (1), it should be
possible to predict structure from sequence, and subsequently to
infer detailed aspects of function from the structure. However, both
problems are extremely complex, and it seems unlikely that either
will be solved in an exact manner in the near future. It may be
possible to obtain approximate solutions by using experimental data
to simplify the problem. In this article, we describe how an analysis
of allowed amino acid substitutions in proteins can be used to
reduce the complexity of sequences and reveal important aspects of
structure and function.



Methods for Studying Tolerance to
Sequence Variation

There are two main approaches to studying the tolerance of an
amino acid sequence to change. The first method relies on the
process of evolution, in which mutations are either accepted or
rejected by natural selection. This method has been extremely
powerful for proteins such as the globins or cytochromes, for which
sequences from many different species are known (2-7). The second
approach uses genetic methods to introduce amino acid changes at
specific positions in a cloned gene and uses selections or screens to
identify functional sequences. This approach has been used to great
advantage for proteins that can be expressed in bacteria or yeast,
where the appropriate genetic manipulations are possible (3, 8-11).
The end results of both methods are lists of active sequences that can
be compared and analyzed to identify sequence features that are
essential for folding or function. If a particular property of a side
chain, such as charge or size, is important at a given position, only
side chains that have the required property will be allowed.
Conversely, if the chemical identity of the side chain is unimportant,
then many different substitutions will be permitted.

Studies in which these methods were used have revealed that
proteins are surprisingly tolerant of amino acid substitutions (2-4,
11). For example, in studying the effects of approximately 1500
single amino acid substitutions at 142 positions in lac repressor,
Miller and co-workers found that about one-half of all substitutions
were phenotypically silent (11). At some positions, many different,
nonconservative substitutions were allowed. Such residue positions
play little or no role in structure and function. At other positions, no
substitutions or only conservative substitutions were allowed. These
residues are the most important for lac repressor activity.

What roles do invariant and conserved side chains play in
proteins? Residues that are directly involved in protein functions
such as binding or catalysis will certainly be among the most
conserved. For example, replacing the Asp in the catalytic triad of
trypsin with Asn results in a 10^-fold reduction in activity (12). A
similar loss of activity occurs in λ repressor when a DNA binding
residue is changed from Asn to Asp (13). To carry out their
function, however, these catalytic residues and binding residues
must be precisely oriented in three dimensions. Consequently,
mutations in residues that are required for structure formation or
stability can also have dramatic effects on activity (10, 14-16).
Hence, many of the residues that are conserved in sets of related
sequences play structural roles.



Substitutions at Surface and Buried Positions

In their initial comparisons of the globin sequences, Perutz and
co-workers found that most buried residues require nonpolar side
chains, whereas few features of surface side chains are generally
conserved (6). Similar results have been seen for a number of protein
families (2, 4, 5, 7, 17, 18). An example of the sequence tolerance at
surface versus buried sites can be seen in Fig. 1, which shows the
allowed substitutions in λ repressor at residue positions that are near
the dimer interface but distant from the DNA binding surface of the
protein (9). These substitutions were identified by a functional
Fig. 1. (A) Amino acid substitutions allowed in a
short region of λ repressor. The wild-type sequence
is shown along the center line. The allowed
substitutions shown above each position
were identified by randomly mutating one to
three codons at a time by using a cassette method
and applying a functional selection (9). (B) The
fractional solvent accessibility (42) of the wild-type
side chain in the protein dimer (43) relative
to the same atoms in an Ala-X-Ala model tripeptide
selection after cassette mutagenesis. A histogram of side chain
[...] dimer is also shown in Fig. 1. At six positions, only the
wild-type residue or [...] allowed. [...] of these positions are
buried in the protein. In contrast, most of the highly [...]
positions tolerate a wide range of chemically different side
chains, including hydrophilic and hydrophobic residues. Hence, it
seems that most of the structural information in this region of the
protein is carried by the residues that are solvent inaccessible.
Constraints on Core Sequences 

[...] for protein folding or stability, we must understand the factors that
[...] only hydrophobic or neutral residues are tolerated at buried
positions, undoubtedly because of the large favorable contribution of
the hydrophobic effect to protein stability (19). For example, [...]
hydrophobic [...] of the NH2-terminal domain of λ repressor (20).
The acceptable core sequences are composed almost exclusively of
Ala, [...] Met, and Phe. The acceptability of many different [...] at
each core position presumably reflects the fact that the hydrophobic
effect, unlike hydrogen bonding, does not depend on specific
residue pairings. Although it is possible to imagine a hypothetical
core structure that is stabilized exclusively by [...] bridges, such a
core would probably be difficult to construct because hydrogen
bonds require pairing of donors and acceptors in an exact geometry.
Thus the [...] of possible structures that use a polar core would
probably be [...]. [...] their hydrogen bonding needs can be
satisfied (22).

[...] closely packed (23), but some [...] volume of acceptable
sequences can vary by about 10% [...] individual sites, however,
can be considerable. For [...] position in the appropriate sequence
contexts. Large volume changes at individual buried sites have also
been [...] phylogenetic studies, where it has been noted that the size
decreases and increases at interacting residues are not necessarily
related in a simple complementary fashion (5, 7, 17). Rather, [...]
accommodated by conformational changes in [...] side chains and by
a variety of backbone movements.

The Informational Importance of the Core 

[...] exceptions, the core must remain hydrophobic [...] is composed
of side chains that can assume only a limited number of [...] steric
clashes. How important are hydrophobicity, volume, and steric
complementarity in determining whether a given [...] forms an
acceptable core? Each factor is essential in a physical sense, as a
stable core is probably unable to tolerate unsatisfied hydrogen
bonding groups, large holes, or steric overlaps (25). However, in an
informational sense, these factors are not equivalent. For example,
in experiments in which three core residues of λ repressor were
mutated simultaneously, volume was a relatively [...] of [...]
three-quarters of the possible combinations of the 20 naturally
occurring amino acids had volumes within [...] unacceptable (20).
In contrast, of the sequences that contained only



Fig. 2. Amino acid substitutions allowed in the core of λ
repressor. The wild-type side chains are shown pictorially in
the approximate orientation seen in the crystal structure
(43). The lists of allowed substitutions at each position are
shown below the wild-type side chains. These substitutions
were identified by randomly mutating one to four
residues at a time by using a cassette method and applying
a functional selection (20). Not all substitutions are allowed
in every sequence background.




the appropriate hydrophobic residues, a significant fraction were
acceptable. Hence, the hydrophobicity of a sequence contains
more information about its potential acceptability in the core than
does the total side chain volume. Steric compatibility was intermediate
between volume and hydrophobicity in informational importance.

Implications for Structure Prediction

At present, the only reliable method for predicting a low-resolution
tertiary structure of a new protein is by identifying
sequence similarity to a protein whose structure is already known
(29, 30). However, it is often difficult to align sequences as the level
of sequence similarity decreases, and it is impossible to [...]
it would be advantageous to increase the reach of the available
structural information by improving methods for detecting distant
sequence relations and for subsequently aligning these sequences
based on structural principles. In a normal homology search, the
sequence database is scanned with a single test sequence, and every
residue must be weighted [...]. Moreover, certain regions of the
protein [...] kinds of information can be obtained from sequence
sets, and several techniques have

The Informational Importance of Surface Sites



We have noted d«t many surface sites can tolerate a wide variety 
ofs.de chains including hydrophilic and hydrophobic residues. ThX 
t«uk might be taken ro indicate that surface pS^idons cont^ S 
^aural infonnanon. However, Bashford « i„ an extensive 
analysis of globm sequences (4), found a strong bias against large 
hydrophobic residues at many surface positions At on! level ti^s 
may reflect consttamts imposed by protein solubility, because 'large 

aggregation. At a more fundamental level, proteiA folding rLuins a 
partmoningbctweensurfaceand buried positions. Con4ucndy to 
aAicve a umque native sute wiAout significant com^tion from 

2 'bT^r hydrophobic r«idues individ- 

uaUy, but the surface as a whole can probably tolerate only a 
moderate number of hydrophobic side chains. ""'X » 

Identification of Residue Roles from 
oets of Sequences 

Often, a protein of interest is a member of a family of related 
^^uences. What can we infer fiom die pattern of allowed luS 
no,, a positions in sec of aligned sequences generated by gcn^c 
or phylogenetic mediods? Residue positions diat can alepra 
number of different side chains, including charged and^g^^'^ J 
^ dues, are almost certain to be on die protein surface Residue I 
S°^^y, ^^^-.^r^^Phobic, whether variable or r^X 
likely to be buned widim die structure. In Fig. 3, tfK«c residue 
P^moi^ in X repressor diat can accept hydropf ilic "side c^' t 

Z T^' '"'^ "^'P' hydrophilic side 

Chans are shown m green. The obligate hydrophobic positions 

Hr^. ^'"^ position? diat cTa^p 

hydrophilic side chains define die surface ^ 

J™ 'T^""' ''^''^ ^ in sets of 

s^«3.?'"''"""rP^''^'"°'*^"'''^''«beraside chain 

unserved To make diis distinction requires an independent assay of 

^ ^■"^'^ ^ "^"^ ^ biophysical techniques 
y smcepnbility to mtraaUular proteolysis (2(Q. or by bindiTS 

-hty to fold even if diese proteins are inactive. Sets of sequences 
.at allow formation of a stable structure can then be comS ro 

ndmg residues bemg diose diat are variable in die set of srable 
but uivariant in die set of functional proteins.Te dSa- 

nSI "^ •'^"'r "^^"^ were also 

•ntified by comparmg die stabUities and activities of a set ^ 

Znt ; "^"T"! '^"^ 'Elated 
mones widi different bmding spedfidrics. 
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been used to combine such information into more appropriately 
, weighted sequence searches and alignments (31). These methods 
were used to align die sequences of retroviral proteases with aspardc 
proteases, which in turn aUowed construction of a three-dimension- 
al model for the protease of human immunodeficiency virus type 1 
(29). Comparison with the rccendy determined oystal structure of 
this protein revealed reasonable agreement in many areas of the 
prcdiacd structure (32). 

The structural information at most surface sites is highly degenerate.
Except for functionally important residues, exterior positions
seem to be important chiefly in maintaining a reasonably polar
surface. The information contained in buried residues is also
degenerate, the main requirement being that these residues remain
hydrophobic. Thus, at its most basic level, the key structural
message in an amino acid sequence may reside in its specific pattern
of hydrophobic and hydrophilic residues. This is meant in an
informational sense. Clearly, the precise structure and stability of a
protein depends on a large number of detailed interactions. It is
possible, however, that structural prediction at a more primitive
level can be accomplished by concentrating on the most basic
informational aspects of an amino acid sequence. For example,
amphipathic patterns can be extracted from aligned sets of sequences
and used, in some cases, to identify secondary structures.

If a region of secondary structure is packed against the hydrophobic
core, a pattern of hydrophobic residues reflecting the periodicity
of the secondary structure is expected (33, 34). These patterns can be
obscured in individual sequences by hydrophobic residues on the
protein surface. It is rare, however, for a surface position to remain
hydrophobic over the course of evolution. Consequently, the amphipathic
patterns expected for simple secondary structures can be
much clearer in a set of related sequences (5). This principle is
illustrated in Fig. 4, which shows helical hydrophobic moment plots
for the Antennapedia homeodomain sequence (Fig. 4A) and for a
composite sequence derived from a set of homologous homeodomain
proteins (Fig. 4B) (35). The hydrophobic moment is a simple
measure of the degree of amphipathic character of a sequence in a
given secondary structure (34). The amphipathic character of the
three α-helical regions in the Antennapedia protein (36) is clearly
revealed only by the analysis of the combined set of homeodomain
sequences. The secondary structure of Arc repressor, a small DNA-binding
protein, was recently predicted by a similar method (5) and
confirmed by nuclear magnetic resonance studies (37).
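The helical hydrophobic moment described above can be sketched in a few lines. This is only a minimal illustration, assuming the three-level grouping (1, 0, -1), the eight-residue window, and the 100° step per residue stated in the Fig. 4 caption. The names `GROUP_VALUE`, `helical_moment`, and `moment_profile` are hypothetical, and the sketch handles a single sequence rather than the composite sequence derived from an aligned set.

```python
import math

# Three-level grouping taken from the Fig. 4 caption (one-letter codes).
GROUP_VALUE = {}
GROUP_VALUE.update({r: 1 for r in "WIFLMVC"})   # H1, high hydrophobicity
GROUP_VALUE.update({r: 0 for r in "YPATHGS"})   # H2, medium hydrophobicity
GROUP_VALUE.update({r: -1 for r in "QNEDKR"})   # H3, low hydrophobicity


def helical_moment(window, delta_deg=100.0):
    """Magnitude of the vector sum of per-residue hydrophobicity values,
    with successive residues rotated by delta_deg around the helix axis."""
    x = y = 0.0
    for n, residue in enumerate(window):
        h = GROUP_VALUE[residue]
        angle = math.radians(n * delta_deg)
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y)


def moment_profile(sequence, window=8, delta_deg=100.0):
    """Moment of every full-length window along the sequence."""
    return [helical_moment(sequence[i:i + window], delta_deg)
            for i in range(len(sequence) - window + 1)]
```

A window whose hydrophobic residues fall on one face of the helix (for example `"LKKLLKKL"`) gives a large moment, whereas a uniformly hydrophobic window such as `"LLLLLLLL"` gives a small one, because the vectors largely cancel.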

The specific pattern of hydrophobic and hydrophilic residues in
an amino acid sequence must limit the number of different structures
a given sequence can adopt and may indeed define its overall fold. If
this is true, then the arrangement of hydrophobic and hydrophilic
residues should be a characteristic feature of a particular fold. Sweet
and Eisenberg have shown that the correlation of the pattern of
hydrophobicity between two protein sequences is a good criterion
for their structural relatedness (38). In addition, several studies
indicate that patterns of obligatory hydrophobic positions identified
from aligned sequences are distinctive features of sequences that
adopt the same structure (4, 29, 38, 39). Thus, the order of
hydrophobic and hydrophilic residues in a sequence may actually be
sufficient information to determine the basic folding pattern of a
protein sequence.
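As a rough illustration of comparing hydrophobicity patterns between two sequences, the sketch below computes a plain Pearson correlation between two pre-aligned, equal-length sequences. It is not the procedure of reference (38): the three-level values are borrowed from the Fig. 4 caption as an assumption, and the names `GROUP_VALUE` and `pattern_correlation` are hypothetical.

```python
import math

# Assumed three-level hydrophobicity values (grouping from the Fig. 4 caption).
GROUP_VALUE = {}
GROUP_VALUE.update({r: 1 for r in "WIFLMVC"})   # high
GROUP_VALUE.update({r: 0 for r in "YPATHGS"})   # medium
GROUP_VALUE.update({r: -1 for r in "QNEDKR"})   # low


def pattern_correlation(seq_a, seq_b):
    """Pearson correlation of the hydrophobicity profiles of two
    aligned, gap-free sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a = [GROUP_VALUE[r] for r in seq_a]
    b = [GROUP_VALUE[r] for r in seq_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)
```

Two sequences with no residues in common can still correlate perfectly under this measure if their hydrophobic and hydrophilic positions coincide, which is the sense in which the pattern, rather than the identity of the residues, carries the information.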

Although the pattern of sequence hydrophobicity may be a
characteristic feature of a particular fold, it is not yet clear how such
patterns could be used for prediction of structure de novo. It is
important to understand how patterns in sequence space can be
related to structures in conformation space. Lau and Dill have
approached this problem by studying the properties of simple
sequences composed only of H (hydrophobic) and P (polar) groups
on two-dimensional lattices (40). An example of such a representation
is shown in Fig. 5. Residues adjacent in the sequence must
occupy adjacent squares on the lattice, and two residues cannot
occupy the same space. Free energies of particular conformations are
evaluated with a single term, an attraction of H groups. By
considering chains of ten residues, an exhaustive conformational
search for all 1024 possible sequences of H and P residues was
possible. For longer sequences only a representative fraction of the
allowed sequence or conformation space could be explored. The
significant results were as follows: (i) not all sequences can fold into
a "native" structure and only a few sequences form a unique native
structure; (ii) the probability that a sequence will adopt a unique
native structure increases with chain length; and (iii) the native
states are compact, contain a hydrophobic core surrounded by polar
residues, and contain significant secondary structure. Although the
gap between these two-dimensional simulations and three-dimensional
structures is large, the use of simple rules and sequence
representations yields results similar to those expected for real
proteins. Three-dimensional lattice methods are also beginning to
be developed and evaluated (41).
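The lattice rules summarized above (adjacent residues on adjacent squares, no double occupancy, a single attraction term between H groups) are simple enough to enumerate exhaustively for short chains. The sketch below is only an illustration under those stated rules and is not the implementation of reference (40). The names `conformations`, `hh_contacts`, and `ground_state` are hypothetical, and the energy is taken here, as an assumption, to be the number of H-H contacts between residues that are not neighbors in the sequence.

```python
def conformations(n):
    """Yield every self-avoiding walk of n residues on the square lattice.
    The first bond is fixed along +x to remove rotated copies; mirror
    images are still generated separately."""
    if n == 1:
        yield ((0, 0),)
        return
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hh_contacts(sequence, path):
    """Count lattice contacts between H residues that are not chain neighbors."""
    index_at = {pos: i for i, pos in enumerate(path)}
    count = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Look only right and up so each adjacent pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                count += 1
    return count


def ground_state(sequence):
    """Return (maximum H-H contacts, number of conformations reaching it)."""
    best, degeneracy = -1, 0
    for path in conformations(len(sequence)):
        contacts = hh_contacts(sequence, path)
        if contacts > best:
            best, degeneracy = contacts, 1
        elif contacts == best:
            degeneracy += 1
    return best, degeneracy
```

Because mirror images are counted separately in this sketch, a sequence with a single lowest-energy fold will usually report a degeneracy of 2 rather than 1. For a ten-residue chain the enumeration covers a few thousand walks, so looping `ground_state` over all 1024 H/P sequences remains a small computation.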



Fig. 4. Helical hydrophobic moments calculated by using (A) the
Antennapedia homeodomain sequence or (B) a set of 39 aligned
homeodomain sequences (35). The bars indicate the extent of the
helical regions identified in nuclear magnetic resonance studies of the
Antennapedia homeodomain (36). To determine hydrophobic moments,
residues were assigned to one of three groups: H1 (high
hydrophobicity = Trp, Ile, Phe, Leu, Met, Val, or Cys); H2 (medium
hydrophobicity = Tyr, Pro, Ala, Thr, His, Gly, or Ser); and H3 (low
hydrophobicity = Gln, Asn, Glu, Asp, Lys, or Arg). For the aligned
homeodomain sequences, the residues at each position were ranked by
hydrophobicity by using the scale of Fauchère and Pliska (45). Arg and
Lys were not counted unless no other residue was found at the position,
because they contain long aliphatic side chains and can thereby
substitute for nonpolar residues at some buried sites. To account for
possible sequence errors and rare exceptions, the most hydrophilic
residue allowed at each position was discarded unless it was observed
twice. The second most hydrophilic residue was then chosen to represent
the hydrophobicity of the position. An eight-residue window was used
and the vectors projected radially every 100°. The vector magnitudes
were assigned a value of 1, 0, or -1 for positions where the
hydrophobicity group was H1, H2, or H3, respectively.

Fig. 5. A representation of one compact conformation for a particular
sequence of H and P residues on a two-dimensional square lattice.
[Adapted from (40), with permission of the American Chemical Society]

Summary

There is more information in a set of related sequences than in a
single sequence. A number of practical applications arise from an
analysis of the tolerance of residue positions to change. First, such
information permits the evaluation of a residue's importance to the
function and stability of a protein. This ability to identify the
essential elements of a protein sequence may improve our understanding
of the determinants of protein folding and stability as well
as protein function. Second, patterns of tolerance to amino acid
substitutions of varying hydrophilicity can help to identify residues
likely to be buried in a protein structure and those likely to occupy
surface positions. The amphipathic patterns that emerge can be used
to identify probable regions of secondary structure. Third, incorporating
a knowledge of allowed substitutions can improve the ability
to detect and align distantly related proteins because the essential
residues can be given prominence in the alignment scoring.

As more sequences are determined, it becomes increasingly likely
that a protein of interest is a member of a family of related
sequences. If this is not the case, it is now possible to use genetic
methods to generate lists of allowed amino acid substitutions.
Consequently, at least in the short term, it may not be necessary to
solve the folding problem for individual protein sequences. Instead,
information from sequence sets could be used. Perhaps by simplifying
sequence space through the identification of key residues, and by
simplifying conformation space as in the lattice methods, it will be
possible to develop algorithms to generate a limited number of trial
structures. These trial structures could then, in turn, be evaluated by
further experiments and more sophisticated energy calculations.